LWS-Det: Layer-Wise Search for 1-bit Detectors
FIGURE 6.11
An illustration of binarization error in 3-dimensional space. (a) The intersection angle θ between the real-valued weight w and activation a is significant. (b) After binarization to (ŵ, â) by the sign function, the intersection angle θ̂ = 0. (c) θ̂ = 0 also holds under XNOR-Net binarization. (d) Ideal binarization via angular and amplitude error minimization.
illustrated in Fig. 6.10. As depicted above, the main learning objective (layer-wise binariza-
tion error) is defined as
\[
E = \sum_{i=1}^{N} \left\| a_{i-1} \otimes w_i - \hat{a}_{i-1} \odot \hat{w}_i \circ \alpha_i \right\|_2^2,
\tag{6.69}
\]
where N is the number of binarized layers. We then optimize E layer-wise as
\[
\arg\min_{\hat{w}_i, \alpha_i} E_i(\hat{w}_i, \alpha_i; w_i, a_{i-1}, \hat{a}_{i-1}), \quad \forall i \in [1, N].
\tag{6.70}
\]
In LWS-Det, we solve Eq. (6.70) by decoupling it into an angular loss and an amplitude loss: the angular loss is optimized by differentiable binarization search (DBS), and the amplitude loss by learning the scale factor.
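As a concrete sketch of this decoupling, the layer-wise error of Eq. (6.69) and the closed-form amplitude fit can be written in NumPy, with the convolution (⊗) simplified to a matrix-vector product over rows of input patches and its 1-bit counterpart (⊙) emulated via sign(). The function names `binarization_error` and `optimal_alpha` are illustrative, not the authors' code:

```python
import numpy as np

def binarization_error(a_prev, w, alpha):
    """Layer-wise binarization error E_i of Eq. (6.69), simplified:
    the convolution is reduced to a matrix-vector product, and the
    1-bit operands are obtained with sign()."""
    real_out = a_prev @ w                       # a_{i-1} (*) w_i
    bin_out = np.sign(a_prev) @ np.sign(w)      # binarized counterpart
    return float(np.sum((real_out - alpha * bin_out) ** 2))

def optimal_alpha(a_prev, w):
    """Closed-form minimizer of E_i over the scale factor alpha:
    the least-squares projection of the real output onto the binary output."""
    real_out = a_prev @ w
    bin_out = np.sign(a_prev) @ np.sign(w)
    return float(real_out @ bin_out / (bin_out @ bin_out))
```

Because the scale factor enters the error quadratically, its optimum has this closed form, which is why LWS-Det can treat the amplitude part separately from the (combinatorial) angular part.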
6.4.3 Differentiable Binarization Search for the 1-Bit Weight
We formulate the binarization task as a differentiable search problem. Since the 1-bit weight is closely tied to the angular error, as shown in Fig. 6.11, we define an angular loss to supervise the search process as
\[
L^{Ang}_i = \left\| \cos\theta_i - \cos\hat{\theta}_i \right\|_2^2
= \left\| \frac{a_{i-1} \otimes w_i}{\|a_{i-1}\|_2 \|w_i\|_2} - \frac{\hat{a}_{i-1} \odot \hat{w}_i}{\|\hat{a}_{i-1}\|_2 \|\hat{w}_i\|_2} \right\|_2^2.
\tag{6.71}
\]
For the learning process of the i-th layer, the objective is formulated as
\[
\arg\min_{\hat{w}_i} L^{Ang}_i(\hat{w}_i; a_{i-1}, w_i, \hat{a}_{i-1}).
\tag{6.72}
\]
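To make the search objective concrete, here is a hedged NumPy sketch: `angular_loss` implements Eq. (6.71) with the convolution reduced to a dot product, and `exhaustive_dbs` enumerates all 2^n sign patterns of a tiny weight vector to solve Eq. (6.72) by brute force. The actual DBS replaces this enumeration with a differentiable relaxation; both names are illustrative:

```python
import itertools
import numpy as np

def angular_loss(a_prev, w, w_bin):
    """Angular loss L_i^Ang of Eq. (6.71), with the convolutions
    reduced to dot products for illustration."""
    a_bin = np.sign(a_prev)
    cos_real = (a_prev @ w) / (np.linalg.norm(a_prev) * np.linalg.norm(w))
    cos_bin = (a_bin @ w_bin) / (np.linalg.norm(a_bin) * np.linalg.norm(w_bin))
    return float((cos_real - cos_bin) ** 2)

def exhaustive_dbs(a_prev, w):
    """Brute-force stand-in for differentiable binarization search:
    return the +/-1 weight vector minimizing the angular loss (Eq. 6.72).
    Only feasible for tiny layers -- 2^n candidates."""
    candidates = itertools.product((-1.0, 1.0), repeat=len(w))
    return min((np.array(c) for c in candidates),
               key=lambda w_bin: angular_loss(a_prev, w, w_bin))
```

Note that the searched binary weight can achieve a strictly lower angular loss than the naive sign(w) binarization, which is the motivation for searching rather than thresholding.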